{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![umap in atlas](https://docs.nomic.ai/img/umap-with-nomic-atlas.png)\n",
    "\n",
    "# UMAP of Text Embeddings with Nomic Atlas\n",
    "\n",
    "UMAP is available as a projection in Nomic Atlas, which creates interactive maps of your data with AI analysis, vector search APIs, and additional resources like duplicate detection and topic label generation.\n",
    "\n",
    "![text embeddings in atlas](https://assets.nomicatlas.com/airline-reviews-umap.gif)\n",
    "\n",
    "Nomic Atlas automatically generates embeddings for your data and allows you to explore large datasets in a web browser. Atlas provides:\n",
    "\n",
    "* In-browser analysis of your UMAP data with the [Atlas Analyst](https://docs.nomic.ai/atlas/data-maps/atlas-analyst)\n",
    "* Vector search over your UMAP data using the [Nomic API](https://docs.nomic.ai/atlas/data-maps/guides/vector-search-over-your-data)\n",
    "* Interactive features like zooming, recoloring, searching, and filtering in the [Nomic Atlas data map](https://docs.nomic.ai/atlas/data-maps/controls)\n",
    "* Scalability for millions of data points\n",
    "* Rich information display on hover\n",
    "* Shareable UMAPs via URL links to your embeddings and data maps in Atlas\n",
    "\n",
    "This example demonstrates how to use [Atlas](https://docs.nomic.ai/atlas/embeddings-and-retrieval/guides/using-umap-with-atlas) to create interactive maps of text using embeddings and UMAP."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "1. Get the required python packages with `pip instll nomic pandas`\n",
    "\n",
    "2. Get A Nomic API key [here](https://atlas.nomic.ai/cli-login)\n",
    "\n",
    "3. Run `nomic login nk-...` in a terminal window or\n",
    "\n",
    "```python\n",
    "import nomic\n",
    "nomic.login('nk-...')\n",
    "```\n",
    "\n",
    "at the top of your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>published_date</th>\n",
       "      <th>published_platform</th>\n",
       "      <th>rating</th>\n",
       "      <th>type</th>\n",
       "      <th>text</th>\n",
       "      <th>title</th>\n",
       "      <th>helpful_votes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2024-03-12T14:41:14-04:00</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>1</td>\n",
       "      <td>review</td>\n",
       "      <td>We used this airline to go from Singapore to L...</td>\n",
       "      <td>Ok</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2024-03-11T19:39:13-04:00</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>2</td>\n",
       "      <td>review</td>\n",
       "      <td>The service on Singapore Airlines Suites Class...</td>\n",
       "      <td>The service in Suites Class makes one feel lik...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2024-03-11T12:20:23-04:00</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>0</td>\n",
       "      <td>review</td>\n",
       "      <td>Booked, paid and received email confirmation f...</td>\n",
       "      <td>Don’t give them your money</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2024-03-11T07:12:27-04:00</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>2</td>\n",
       "      <td>review</td>\n",
       "      <td>Best airline in the world, seats, food, servic...</td>\n",
       "      <td>Best Airline in the World</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2024-03-10T05:34:18-04:00</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>0</td>\n",
       "      <td>review</td>\n",
       "      <td>Premium Economy Seating on Singapore Airlines ...</td>\n",
       "      <td>Premium Economy Seating on Singapore Airlines ...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              published_date published_platform  rating    type  \\\n",
       "0  2024-03-12T14:41:14-04:00            Desktop       1  review   \n",
       "1  2024-03-11T19:39:13-04:00            Desktop       2  review   \n",
       "2  2024-03-11T12:20:23-04:00            Desktop       0  review   \n",
       "3  2024-03-11T07:12:27-04:00            Desktop       2  review   \n",
       "4  2024-03-10T05:34:18-04:00            Desktop       0  review   \n",
       "\n",
       "                                                text  \\\n",
       "0  We used this airline to go from Singapore to L...   \n",
       "1  The service on Singapore Airlines Suites Class...   \n",
       "2  Booked, paid and received email confirmation f...   \n",
       "3  Best airline in the world, seats, food, servic...   \n",
       "4  Premium Economy Seating on Singapore Airlines ...   \n",
       "\n",
       "                                               title  helpful_votes  \n",
       "0                                                 Ok              0  \n",
       "1  The service in Suites Class makes one feel lik...              0  \n",
       "2                         Don’t give them your money              0  \n",
       "3                          Best Airline in the World              0  \n",
       "4  Premium Economy Seating on Singapore Airlines ...              0  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Example data\n",
    "df = pd.read_csv(\"https://docs.nomic.ai/singapore_airlines_reviews.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Atlas Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-05-12 02:42:13.955\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36mnomic.dataset\u001b[0m:\u001b[36m_create_project\u001b[0m:\u001b[36m860\u001b[0m - \u001b[1mOrganization name: `nomic`\u001b[0m\n",
      "\u001b[32m2025-05-12 02:42:14.453\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36mnomic.dataset\u001b[0m:\u001b[36m_create_project\u001b[0m:\u001b[36m878\u001b[0m - \u001b[1mCreating dataset `airline-reviews-data`\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "from nomic import AtlasDataset\n",
    "dataset = AtlasDataset(\"airline-reviews-data\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload to Atlas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-05-12 02:42:53.678\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36mnomic.dataset\u001b[0m:\u001b[36m_validate_and_correct_arrow_upload\u001b[0m:\u001b[36m348\u001b[0m - \u001b[33m\u001b[1mReplacing 1 null values for field title with string 'null'. This behavior will change in a future version.\u001b[0m\n",
      "100%|██████████| 2/2 [00:01<00:00,  1.61it/s]\n",
      "\u001b[32m2025-05-12 02:42:54.922\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36mnomic.dataset\u001b[0m:\u001b[36m_add_data\u001b[0m:\u001b[36m1714\u001b[0m - \u001b[1mUpload succeeded.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "dataset.add_data(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Data Map"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We specify the `text` field from `df` as the field to create embeddings from. We choose some standard UMAP parameters as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m2025-05-12 02:45:19.821\u001b[0m | \u001b[1mINFO    \u001b[0m | \u001b[36mnomic.dataset\u001b[0m:\u001b[36mcreate_index\u001b[0m:\u001b[36m1273\u001b[0m - \u001b[1mCreated map `0196c33d-a446-6784-d95c-423621815a5e` in dataset `nomic/airline-reviews-data`: https://atlas.nomic.ai/data/nomic/airline-reviews-data\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "from nomic.data_inference import ProjectionOptions\n",
    "\n",
    "atlas_map = dataset.create_index(\n",
    "    indexed_field='text',\n",
    "    projection=ProjectionOptions(\n",
    "      model=\"umap\",\n",
    "      n_neighbors=20,\n",
    "      min_dist=0.01,\n",
    "      n_epochs=200\n",
    "  )\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "atlas",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
